ABSTRACT

Two  pitch  perception  modeling  algorithms  are  described.  The  first  algorithm 
models  “periodicity”  pitch  perception,  and  the  second  algorithm  models  “place”  pitch 
perception. 

The  two  models  are  now  applied  to  various  psychoacoustic  stimuli.  Both 
periodicity  and  place  models  yield  results  that  are  in  general  agreement  with  psychoacoustic 
measurements for the missing fundamental and for inharmonic stimuli. The place algorithm
proved to be a better approximation than periodicity for processing comb-filtered
noise.  Periodicity  was  more  successful  for  periodic  pulse  train  stimuli. 
1. REVIEW OF SOME ASPECTS OF PSYCHOACOUSTICS


1.1  THE  MISSING  FUNDAMENTAL 

When a listener matches a pure tone (i.e., sine wave) to a complex tone consisting of a sum of
harmonically related sinusoids (e.g., harmonics 3, 4, and 5 of some fundamental frequency f), the
match will take place at f, which is the missing fundamental frequency of the complex tone.

1.2  PITCH  OF  INHARMONIC  SIGNALS 

When a listener is asked to match the sum of sinusoids at frequencies 3f, 4f, and 5f to a pure tone,
that match will occur at f. What happens if each of the above three sinusoids is shifted in frequency by
Δf? DeBoer [1] performed such an experiment, as did van den Brink [2] and Smoorenburg [3]. This
research showed that subjects were still able to perceive pitch despite the inharmonicity of the signal.
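
As an illustration of Sections 1.1 and 1.2, the following minimal sketch estimates the period of a three-component complex from the peak of its autocorrelation. The autocorrelation is only a convenient stand-in here and is not one of the two algorithms described in this paper. The function name `autocorr_pitch`, the 10-kHz sampling rate, the 2000-sample record, the 125 to 400 Hz search range, and the 30-Hz shift are all assumptions chosen for the example.

```python
import math

FS = 10_000  # sampling rate in Hz (assumed for this example)

def autocorr_pitch(freqs, fs=FS, n_samples=2000, fmin=125.0, fmax=400.0):
    """Return the pitch (Hz) at the autocorrelation peak of a sum of cosines.

    Only lags corresponding to pitches between fmin and fmax are searched.
    """
    x = [sum(math.cos(2 * math.pi * f * i / fs) for f in freqs)
         for i in range(n_samples)]
    best_lag, best_val = None, -math.inf
    for lag in range(int(fs / fmax), int(fs / fmin) + 1):
        # Unbiased autocorrelation estimate at this lag.
        r = sum(x[i] * x[i + lag] for i in range(n_samples - lag)) / (n_samples - lag)
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag

# Harmonics 3, 4, 5 of f = 200 Hz: the peak falls at the missing fundamental.
harmonic = autocorr_pitch([600, 800, 1000])
# The same components shifted by 30 Hz: the peak moves to a shorter period.
shifted = autocorr_pitch([630, 830, 1030])
print(harmonic, shifted)
```

Under these assumptions the harmonic complex gives 200 Hz, and the shifted complex gives a lag of 48 samples (about 208.3 Hz). That is the nearest whole-sample period to the commonly cited first-order approximation f + Δf/4 = 207.5 Hz for a complex centered on the fourth harmonic.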

1.3 PITCH OF REPEATED NOISE

Miller and Taylor [4] discovered that pitch could be perceived when the listener was presented with
repeated bursts of noise. More recent research found that if white noise was comb-filtered (i.e., the output
is the sum of the input and a version of the input delayed by T), a pitch of 1/T was perceived [5-8]. The
stimulus “sounds like” noise; however, when T is systematically varied such that 1/T steps through the
frequencies corresponding to the seven notes of the major scale, the pitch of these notes is heard [9].
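
The comb filter described above can be sketched in a few lines. The example below is an illustration only: it assumes a 10-kHz sampling rate, a delay of 50 samples (T = 5 ms), Gaussian noise from a seeded generator, and an autocorrelation peak search as a simple way to expose the delay. The helper names `comb_filter` and `autocorr_peak_lag` are hypothetical.

```python
import random

def comb_filter(x, delay):
    """y[n] = x[n] + x[n - delay], with x treated as zero before n = 0."""
    return [x[n] + (x[n - delay] if n >= delay else 0.0) for n in range(len(x))]

def autocorr_peak_lag(y, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the largest autocorrelation estimate."""
    n = len(y)

    def r(lag):
        return sum(y[i] * y[i + lag] for i in range(n - lag)) / (n - lag)

    return max(range(min_lag, max_lag + 1), key=r)

FS = 10_000              # sampling rate in Hz (assumed)
DELAY = 50               # T = 5 ms at 10 kHz, so 1/T = 200 Hz
rng = random.Random(0)   # fixed seed for repeatability
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
y = comb_filter(noise, DELAY)
lag = autocorr_peak_lag(y, 10, 120)
print(lag, FS / lag)
```

For white noise the filtered signal is correlated with itself only at the delay T, so the peak is expected at a lag of 50 samples, corresponding to 1/T = 200 Hz, even though the waveform itself is noise.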

1.4  PITCH  PERCEPTION  OF  PULSE  TRAINS 

Flanagan  and  Guttman  [10]  discovered  two  distinct  modes  of  pitch  perception  for  periodic  pulse 
train  stimuli.  To  quote  their  paper,  “In  the  first  mode,  for  pulse  rates  less  than  100  pps,  the  pulse  trains 
are  ascribed  a  pitch  equal  to  the  number  of  pulses  per  second,  regardless  of  the  polarity  pattern  of  the 
pulses.  In  the  second  mode,  for  fundamental  frequencies  in  excess  of  200  Hz,  the  sounds  are  assigned 
a  pitch  equal  to  the  fundamental  frequency.” 

1.5  CIRCULARITY  IN  JUDGMENTS  OF  RELATIVE  PITCH 

Shepard [11] demonstrated that for specialized signals consisting of the sum of tones separated
by octaves (e.g., 150, 300, 600, and 1200 Hz), listeners will often identify an increase in the tone
frequencies  as  a  lowering  of  pitch.  On  the  average,  if  all  tone  frequencies  are  increased  by  less  than  one- 
half  octave,  the  new  stimulus  is  judged  to  be  higher  in  pitch  than  the  old  one.  If,  however,  all  tone 
frequencies  are  increased  by  more  than  one-half  octave  (but  less  than  one  octave),  the  new  stimulus  is 
judged  to  be  lower  in  pitch.  Pollack  [12]  viewed  this  result  as  a  further  example  of  the  decoupling  of 
auditory pitch and stimulus frequency. Through a series of experiments he identified important parameters
of  this  phenomenon  as  “the  number  of  components  in  the  signal,  the  number  of  offsetting  frequencies 
which  are  weighted  against  the  direction  of  Shepard  pitch  and  perhaps  the  spacing  between  the  components.” 
Deutsch  [13]  has  discovered  other  interesting  properties  of  “Shepard  pitch.”  For  example,  a  tone  pattern 
can  be  heard  as  ascending  when  played  in  one  key  and  descending  when  played  in  another. 
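
The half-octave rule described above follows from the octave spacing of the components: raising every component by some fraction of an octave lands on the same position within the octave as lowering it by the complementary fraction. The sketch below illustrates this with the example frequencies from the text. The function names `chroma` and `judged_direction` are hypothetical, and the "ambiguous" label at exactly one-half octave is an assumption.

```python
import math

def chroma(freq_hz):
    """Position within the octave, in [0, 1), on a log-frequency scale."""
    return math.log2(freq_hz) % 1.0

def judged_direction(shift_octaves):
    """Average judged direction for an upward shift of all components.

    Encodes the rule stated in the text; the shift must lie between 0 and 1.
    """
    if not 0.0 < shift_octaves < 1.0:
        raise ValueError("shift must lie strictly between 0 and 1 octave")
    if shift_octaves < 0.5:
        return "higher"
    if shift_octaves > 0.5:
        return "lower"
    return "ambiguous"  # assumption: exactly one-half octave is undecided

base = [150.0, 300.0, 600.0, 1200.0]     # octave-spaced components
up = [f * 2 ** 0.75 for f in base]       # raised by three-quarters of an octave
down = [f * 2 ** -0.25 for f in base]    # lowered by one-quarter of an octave

# Both shifts land on the same position within the octave.
same_chroma = all(abs(chroma(a) - chroma(b)) < 1e-9 for a, b in zip(up, down))
print(same_chroma, judged_direction(0.25), judged_direction(0.75))
```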
2. DESCRIPTION OF PERIODICITY AND PLACE
PROGRAMS  TO  MODEL  PITCH  PERCEPTION 


2.1  PERIODICITY  PITCH 

Figure  1  shows  block  diagrams  of  both  periodicity  and  place  algorithms  for  pitch  detection.  The 
periodicity  algorithm  assumes  a  correspondence  between  the  basilar  membrane  and  the  filter  bank  and 
a  further  correspondence  between  the  hair  cell-auditory  nerve  complex  (on  the  one  hand)  and  elementary 
pitch detectors EPD_1 through EPD_M. The filter designs are based on physiological measurements of
Delgutte [14]. At present, 19 filters have been implemented; together they cover a 2-kHz frequency range. The
ability  of  these  filters  to  resolve  harmonics  is  a  function  of  the  pitch  and  formant  structure  of  the  incoming 
signal;  thus,  a  given  filter,  representing  a  specific  place  on  the  basilar  membrane,  can  sometimes  perform 
a  place  function  and,  at  other  times,  the  same  filter  can  perform  a  periodicity  function. 
Figure 1. (a) Block diagram for the periodicity algorithm and
(b)  block  diagram  of  the  two-stage  place  algorithm. 
Neural spiking tends to follow the peaks of the signal. Given an auditory nerve spike, that same
nerve  cannot  respond  to  further  stimulation  for  a  time  immediately  following  the  spike;  this  is  called  the 
refractory  period.  Following  this  period,  the  potential  differences  inside  the  neuron  gradually  return  to 
normal,  thus  steadily  increasing  the  probability  of  subsequent  firings. 

The  global  algorithm  shown  in  Figure  1(a)  further  processes  the  succession  of  time  intervals 
between  spike  occurrences.  The  algorithm  is  based  on  the  hypothesis  that  the  higher  auditory  centers  can 
interpret  intervals  separated  by  several  spikes  to  produce  five  monotonically  increasing  intervals.  Thus, 
at  any  instant,  each  elementary  pitch  detector  (EPD)  presents  five  numbers  to  the  global  detector.  Each 
filter  output  excites  two  EPDs;  the  positive  set  of  EPDs  spikes  on  positive  excursions  of  the  signals,  while 
the  negative  set  spikes  on  negative  excursions.  Thus,  there  are  2  x  19  =  38  EPDs,  each  producing  five 
intervals  so  that  there  are  5  x  38  =  190  intervals  available.  The  final  global  periodicity  decision  is 
obtained  by  developing  a  histogram  of  these  190  intervals  and  choosing  the  mode  or  maximum  of  the 
resultant  probability  density  function. 
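
The pooling step can be sketched as follows. This is a simplified illustration, not the implemented algorithm: it uses three hypothetical EPDs instead of 38, takes spike times as given integer sample indices (no filter bank or refractory model), and uses one histogram bin per sample at an assumed 10-kHz sampling rate. The names `epd_intervals` and `global_period` are invented for the example.

```python
from collections import Counter

FS = 10_000  # sampling rate in Hz (assumed)

def epd_intervals(spike_times, n_intervals=5):
    """Intervals from the latest spike back to each of the previous n spikes.

    The result is monotonically increasing, as described in the text.
    """
    last = spike_times[-1]
    previous = spike_times[-(n_intervals + 1):-1]
    return [last - t for t in reversed(previous)]

def global_period(all_intervals):
    """Mode of the pooled interval histogram (one bin per sample)."""
    histogram = Counter(all_intervals)
    return histogram.most_common(1)[0][0]

# Hypothetical EPDs locked to harmonics 3, 4, and 5 of a 60-sample period:
# they spike every 20, 15, and 12 samples respectively.
pooled = []
for spacing in (20, 15, 12):
    spikes = list(range(0, 601, spacing))
    pooled.extend(epd_intervals(spikes))

period = global_period(pooled)
print(period, FS / period)
```

In this toy case each detector contributes different short intervals, but all three contribute an interval of 60 samples, so the histogram mode falls at the common period (about 166.7 Hz) even though no detector is tuned to the fundamental.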

2.2  PLACE  PITCH 

The  underlying  hypothesis  of  place  detection  is  the  ability  of  the  auditory  system  to  resolve  enough 
harmonic  peaks  of  the  stimulus.  This  resolution  can  take  place  at  the  periphery  or  at  higher  levels.  In 
fact,  Houtsma  and  Goldstein  [15]  have  shown  that  centrally  located  auditory  processes  can  indeed  perceive 
pitch  even  if  each  ear  is  subjected  to  a  single  harmonic  of  the  fundamental  frequency.  Figure  1(b)  shows 
an  implementation  of  a  high-resolution  spectral  analysis  via  a  4096-point  fast  Fourier  transform. 

A  reliable  algorithm  that  leads  to  the  Goldstein  algorithm  [16]  can  be  implemented  as  a  two-stage 
process.  Stage  1  is  a  version  of  the  Seneff  algorithm  [17]  that  performs  a  statistical  analysis  of  the 
frequency  separation  between  spectral  peaks.  Stage  2  is  related  to  the  “harmonic  sieve”  algorithm  of 
Goldstein’s  pitch  perception  model,  as  implemented  by  Duifhuis  [18].  The  spectral  peaks  are  correlated 
with  sets  of  harmonically  spaced  narrow  windows.  The  chosen  sets  are  based  on  the  “winning  pitch”  of 
stage 1. If the winning pitch of stage 1 is f, the chosen sets include f ± 25 Hz. Thus, the results of stage 1
greatly  restrict  the  range  of  measurements  of  stage  2.  The  combination  avoids  many  ambiguous  pitch 
results. 
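
The two-stage idea can be sketched under strong simplifying assumptions. Stage 1 below is reduced to taking the most common spacing between adjacent spectral peaks, and stage 2 is a basic sieve that tests candidate fundamentals within ±25 Hz of the stage 1 result. Neither function reproduces the Seneff or Duifhuis implementations; the function names, the 0.5-Hz candidate step, the 4% window width, and the ranking rule are all assumptions made for illustration.

```python
from collections import Counter

def stage1_spacing(peaks):
    """Most common spacing between adjacent spectral peaks (Hz)."""
    ordered = sorted(peaks)
    spacings = [b - a for a, b in zip(ordered, ordered[1:])]
    return Counter(spacings).most_common(1)[0][0]

def stage2_sieve(peaks, f1, half_range=25.0, step=0.5, tol=0.04):
    """Search f1 +/- half_range for the fundamental whose harmonics best fit.

    A peak passes the sieve if it lies within the relative tolerance `tol`
    of the nearest harmonic of the candidate. Candidates are ranked by the
    number of passing peaks, then by the smallest total relative deviation.
    """
    best_key, best_f = None, None
    steps = int(round(2 * half_range / step))
    for k in range(steps + 1):
        f = f1 - half_range + k * step
        passed, total_dev = 0, 0.0
        for p in peaks:
            n = max(1, round(p / f))
            dev = abs(p - n * f) / (n * f)
            if dev <= tol:
                passed += 1
                total_dev += dev
        key = (passed, -total_dev)
        if best_key is None or key > best_key:
            best_key, best_f = key, f
    return best_f

peaks = [630.0, 830.0, 1030.0]   # inharmonic example with 200-Hz spacing
f1 = stage1_spacing(peaks)       # stage 1 "winning pitch"
f2 = stage2_sieve(peaks, f1)     # stage 2 refinement within f1 +/- 25 Hz
print(f1, f2)
```

For this inharmonic example, stage 1 returns the 200-Hz peak spacing and stage 2 refines it to 207.5 Hz, the candidate whose third, fourth, and fifth harmonics lie closest to the three peaks. Restricting stage 2 to a narrow range around the stage 1 result keeps the sieve from settling on subharmonics or other distant candidates.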
3. PRELIMINARY MODEL RESULTS FOR PSYCHOACOUSTIC STIMULI


3.1  PITCH  OF  INHARMONIC  SIGNALS  (SHIFT  OF  VIRTUAL  PITCH) 

Figure  2  shows  several  cases  for  both  place  and  periodicity  model  responses  to  inharmonic  stimuli. 
Since  people  perceive  pitch  despite  the  absence  of  harmonic  structure  at  the  pitch  frequency,  the  term 
"virtual  pitch"  is  used  to  describe  the  results.  Figures  2(a)  and  (c)  show  the  model  results  for  different 
harmonic  structures.  Figure  2(b)  shows  a  comparison  between  the  human  response  (dashed  line)  and  the 
two  models.  It  appears  that  both  place  and  periodicity  models  respond  similarly.  Also,  both  models 
respond qualitatively in the same manner as the human. Because of our 10-kHz sampling rate, the model
results  are  quantized. 
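
The size of this quantization can be estimated with a short calculation, assuming that each model reports a period equal to a whole number of samples. That assumption is an interpretation of the text, and `quantized_pitch` is a hypothetical helper.

```python
FS = 10_000  # sampling rate in Hz, as stated in the text

def quantized_pitch(true_pitch_hz, fs=FS):
    """Pitch reported when the period is rounded to a whole number of samples."""
    period_samples = round(fs / true_pitch_hz)
    return fs / period_samples

# Representable pitches around 200 Hz: periods of 51, 50, and 49 samples.
for period in (51, 50, 49):
    print(period, round(FS / period, 2))

print(quantized_pitch(207.5))
```

Near 200 Hz the representable pitches are therefore about 4 Hz apart (196.08, 200.0, and 204.08 Hz), and the step size grows as the pitch rises.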
Figure 2. Shift of Virtual Pitch. (a) Periodicity pitch track for a stimulus consisting of
harmonics 1 and 2, (b) three pitch curves for a stimulus consisting of harmonics 3, 4, and
5, and (c) a stimulus consisting of harmonics 2 through 5 inclusive.
3.2 PITCH OF COMB-FILTERED NOISE


Figure  3  shows  how  the  models  respond  to  comb-filtered  white  noise.  Ten  delays  were  imposed; 
they  ranged  from  12.0  to  2.1  ms.  Each  of  the  comb-filtered  signals  is  processed  by  both  models  for  160  ms 
and  then  followed  by  a  pause  of  55  ms.  Figure  3  shows  the  pitch  period;  the  dips  in  period  are  due  to 
the “off” parts of the signal. The place algorithm follows the results obtained from psychoacoustics for
the first six of the ten cases. The periodicity algorithm also tends to follow this pattern but much less reliably.
Interestingly, however, the higher pitches are better represented by the periodicity model. A speculative
hypothesis  could  attribute  lower  pitch  results  to  a  place  model  and  higher  pitch  results  to  a  periodicity 
model. 
Figure 3. Pitch of comb-filtered noise, which shows the tracks of the detected periods
from  both  place  and  periodicity  pitch  detection  algorithms. 


3.3  TRANSITION  FROM  RATE  TO  FUNDAMENTAL  FREQUENCY 

Figure  4  shows  the  results  of  the  periodicity  pitch  model.  A  single  period  of  the  repetitive  stimulus 
is  shown  in  the  box  at  the  upper  left.  Every  0.18  s,  the  pulse  rate  r  is  increased  in  steps.  The  fundamental 
frequency f is always r/4. Periodicity pitch follows the rate until r = 300 pulses/s and then abruptly
switches to follow the fundamental. This is quite analogous to the response of human listeners to the
same  stimulus.  For  this  stimulus,  the  periodicity  model  appears  to  work  properly,  but  the  place  model 
does  not. 
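
The relation between rate and fundamental can be illustrated with a toy pulse train. The actual stimulus waveform appears only in Figure 4, so the four-pulse polarity pattern below is a hypothetical choice that merely satisfies f = r/4; the sampling rate, pulse spacing, and helper names are likewise assumptions.

```python
def pulse_train(spacing, polarity, n_pulses):
    """Unit pulses every `spacing` samples with a repeating polarity pattern."""
    x = [0] * (spacing * n_pulses)
    for k in range(n_pulses):
        x[k * spacing] = polarity[k % len(polarity)]
    return x

def smallest_period(x):
    """Smallest shift p > 0 for which x[n] == x[n + p] over the whole record."""
    for p in range(1, len(x)):
        if all(x[n] == x[n + p] for n in range(len(x) - p)):
            return p
    return len(x)

FS = 10_000                # sampling rate in Hz (assumed)
SPACING = 25               # 25 samples between pulses: r = 400 pulses/s
PATTERN = (1, 1, 1, -1)    # hypothetical four-pulse polarity pattern

x = pulse_train(SPACING, PATTERN, n_pulses=40)
rate = FS / smallest_period([abs(v) for v in x])   # polarity ignored
fundamental = FS / smallest_period(x)              # polarity respected
print(rate, fundamental)
```

Ignoring polarity, the train repeats every pulse interval (400 pulses/s); respecting polarity, it repeats only every fourth pulse (100 Hz). These are the two candidate pitches between which the periodicity model and human listeners switch.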

3.4  CIRCULARITY  IN  PITCH  PERCEPTION 

Figure  5  shows  the  pitch  periods  generated  by  the  periodicity  model  for  various  fundamental 
frequencies  of  the  Shepard  stimuli.  Figure  5  shows  that  this  result  is  consistent  (for  the  periodicity  model) 
over  an  octave  range  of  lowest  tones.  The  place  model,  on  the  other  hand,  appears  to  yield  completely 
ambiguous  results. 
Jean-Claude Risset experimented with a complex Shepard pitch signal consisting of ten components
that each descend ten octaves but are perceived together as an endless glissando or pitch slide
that remains within a single octave register [19-21]. Figure 6 shows the responses of both
periodicity and place models to Risset’s “Endless Glissando.” Both models remain within the single
octave  during  the  cycle  with  the  exception  of  an  ambiguous  region  for  the  periodicity  model  near  the  half 
octave  point.  The  place  algorithm  is  not  ambiguous  and  seems  to  track  the  physical  stimulus  well.  This 
is  in  contrast  to  its  response  to  Shepard  tones.  A  possible  reason  for  this  apparent  discrepancy  is  that 
in  our  version  of  the  Shepard  tones  all  harmonics  were  equal,  while  our  version  of  the  Risset  stimulus 
included  his  amplitude  window.  Future  experiments  may  shed  further  light  on  this  issue. 
Figure 4. Rate versus frequency plot from the periodicity pitch algorithm for the
Flanagan-Guttman  pulse  stimulus. 


[Figure graphic: axes PERIOD (ms) versus TIME (s); panel label PERIODICITY; DETECTED PITCH values 200, 217, 240, 129, 138, 149, 158, 169, 178, 188, 196 listed against LOWEST TONE (Hz), axis ticks 100 to 210 in steps of 10.]

Figure  5.  Comparison  of  periodicity  and  place  algorithms  for  pitch  circularity  using  a  Shepard 
tone  complex.  The  one -half  octave  shift  (circularity)  described  in  the  psychoacoustics  literature  is 
contrasted  to  the  arbitrary  response  from  the  place  algorithm  to  the  same  stimuli. 
Figure 6. Periodicity and place model responses to Risset's "Endless Glissando."
4. SUMMARY AND CONCLUSIONS


Two models of pitch perception have been implemented, and the responses of these models to
various psychoacoustic stimuli have undergone preliminary study. Both models successfully track the
pitch of a harmonic signal with a missing fundamental. The periodicity model corresponds to psychoacoustic
results  from  human  listeners  for  inharmonic  stimuli,  periodic  pulse  train  stimuli,  and  Shepard  stimuli.  On 
the  other  hand,  the  place  model  corresponds  to  psychoacoustic  results  for  inharmonic  stimuli,  comb- 
filtered  noise,  and  nonsimultaneous  harmonics.  These  results  can  help  psychophysicists  speculate  on 
auditory  nerve  functions  above  the  periphery,  including  a  possible  mechanism  that  might  combine  the 
optimal  performance  of  both  models. 